In machine translation applications, the encoder and decoder are typically
Generative Adversarial Networks (GANs)
Recurrent Neural Networks (typically vanilla RNNs, LSTMs, or GRUs)
Mentats
Word Embeddings
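In the classic sequence-to-sequence setup, both components are recurrent networks. A minimal sketch of that arrangement, assuming PyTorch and hypothetical vocabulary and layer sizes chosen only for illustration:

```python
import torch.nn as nn

# Hypothetical sizes, for illustration only.
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 10_000, 200, 256

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.rnn = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)

    def forward(self, src_tokens):
        # src_tokens: (batch, src_len); outputs: (batch, src_len, hidden)
        outputs, hidden = self.rnn(self.embed(src_tokens))
        return outputs, hidden

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.rnn = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, tgt_tokens, hidden):
        # Conditioned on the encoder's final hidden state; produces a
        # distribution over the target vocabulary at each step.
        outputs, hidden = self.rnn(self.embed(tgt_tokens), hidden)
        return self.out(outputs), hidden
```

An LSTM (or, in a toy setting, a vanilla RNN) would slot in the same way; the GRU here is just one common choice.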
What's a more reasonable embedding size for a real-world application?
4
200
6,000
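For scale: toy walkthroughs often use tiny embeddings so the vectors fit on a slide, while real systems typically use a few hundred dimensions (published word embeddings commonly run around 100 to 300). A quick comparison, assuming PyTorch and a hypothetical 10,000-word vocabulary:

```python
import torch.nn as nn

# A 4-dimensional embedding is only big enough for toy examples.
toy_embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=4)

# Something on the order of 200 dimensions is a realistic choice;
# 6,000 would approach the vocabulary size itself, defeating the
# purpose of a compact dense representation.
real_embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=200)
```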
Which time steps require calculating an attention vector in a seq2seq model with attention?
Every time step in the model (both encoder and decoder)
Every time step in the encoder only
Every time step in the decoder only
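To see why the decoder is the relevant side: the encoder runs once to produce its full sequence of hidden states, and a fresh attention (context) vector is then computed at every decoder time step against those states. A minimal dot-product attention sketch, assuming PyTorch, with a helper defined purely for illustration:

```python
import torch
import torch.nn.functional as F

def attention_vector(decoder_hidden, encoder_outputs):
    """Compute one attention (context) vector for a single decoder step.

    decoder_hidden:  (batch, hidden)          -- current decoder state
    encoder_outputs: (batch, src_len, hidden) -- all encoder states
    """
    # Score each encoder time step against the current decoder state.
    scores = torch.bmm(encoder_outputs, decoder_hidden.unsqueeze(2))    # (batch, src_len, 1)
    weights = F.softmax(scores.squeeze(2), dim=1)                       # (batch, src_len)
    # The context vector is the weighted sum of encoder states.
    return torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)  # (batch, hidden)
```

In a decoding loop, this function would be called once per generated token, so the number of attention computations grows with the output length, not with the encoder's.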